Network graph evolution rule generation

ABSTRACT

A network's evolution is characterized by graph evolution rules. A graph that represents an evolutionary network is mined to identify evolutional patterns of the network, and graph evolution rules are generated using identified evolutional patterns. The generated graph evolution rules represent the evolutional patterns of the network.

FIELD OF THE DISCLOSURE

The present disclosure relates to characterization of a network's evolution, such as a characterization of an evolution of a social network using graph evolution rules, and more particularly to mining a graph that represents an evolutionary network to identify evolutional patterns, and using identified evolutional patterns to generate graph evolution rules, which represent the evolutional patterns of the network.

BACKGROUND

A social network has been described as a social structure of relationships or bonds between individuals, and/or groups of individuals, such as societies. The social structure can be said to be made of individuals (or organizations) called “nodes,” which are connected, or related, in some manner.

The social structure can be expressed using conventional network terms, such as nodes and edges, where nodes are entities, such as individuals, groups, etc., and edges, or ties, represent relationships between entities. Social networks can operate on many levels, such as familial, organizational, community, geographical, national, etc. Social networks can be quite complex, and analysis of a social network, or networks, can consume considerable computational resources.

SUMMARY

The present disclosure seeks to address failings in the art and to provide a system, method and apparatus of analyzing temporal evolution of a social network. In accordance with one or more embodiments, a network's evolution is characterized by graph evolution rules. A graph that represents an evolutionary network is mined to identify evolutional patterns of the network, and graph evolution rules are generated using identified evolutional patterns. The generated graph evolution rules represent the evolutional patterns of the network.

In accordance with one or more embodiments, a network, such as and without limitation a social network, is examined to identify patterns, which identify an evolution of the network. In accordance with one or more embodiments, the identified patterns are expressed as one or more rules. The identified patterns can be used in any application in which the evolution of a network is of interest. By way of some non-limiting examples, the patterns can be used as a predictor of the future evolution of the network. The local concentration of patterns in a certain region of the graph might indicate a certain level of evolution/progress in that area. The patterns, as they describe the way the network evolves, can be used to discriminate among different networks and the manner in which each network evolves. These and other observations that are made available via the patterns identified in accordance with disclosed embodiments can be used in applications such as: fraud detection, homeland security, evolution of terrorist networks, information propagation in social networks, group formation, group recommendation, friend recommendation, viral marketing, etc.

In accordance with one or more embodiments, a method is provided, the method comprising collecting, by at least one processing unit, multiple graphs corresponding to a network, the network evolving over time, each graph representing a snapshot reflecting a state of the network; forming, by the at least one processing unit, one graph by merging the multiple graphs representing the multiple snapshots of the network, the formed graph comprising a set of nodes and a set of edges, each edge connecting two nodes from the set of nodes and having a temporal label; mining, by the at least one processing unit, the graph to identify multiple patterns, each pattern being a subgraph in the formed graph, each pattern having an associated support; selecting a pattern from the identified patterns; identifying, by the at least one processing unit, a child pattern of the selected pattern, the identified child pattern having a support that is at least equal to the support of the selected pattern and missing a portion of the pattern, the missing portion of the pattern including at least one edge of the pattern; and creating, by the at least one processing unit, a graph evolution rule, the rule indicating that any occurrence of the child pattern implies a corresponding occurrence of the pattern, the corresponding occurrence of the pattern being formed with the addition of the portion missing from the child pattern at a time indicated by the missing edge's temporal label. In accordance with at least one embodiment, the temporal label represents the time of the snapshot in which the edge first appeared.

In accordance with at least one embodiment, a system is provided, which system comprises at least one computing device. The at least one computing device comprises a graph merging component that collects multiple graphs corresponding to a network, the network evolving over time, each graph representing a snapshot reflecting a state of the network; and forms one graph by merging the multiple graphs representing the multiple snapshots of the network, the formed graph comprising a set of nodes and a set of edges, each edge connecting two nodes from the set of nodes and having a temporal label; a mining component that mines the formed graph to identify multiple patterns, each pattern being a subgraph in the formed graph, each pattern having an associated support; and a graph evolution rule generator that selects a pattern from the identified patterns; identifies a child pattern of the selected pattern, the identified child pattern having a support that is at least equal to the support of the selected pattern and missing a portion of the pattern, the missing portion of the pattern including at least one edge of the pattern; and creates a graph evolution rule, the rule indicating that any occurrence of the child pattern implies a corresponding occurrence of the pattern, the corresponding occurrence of the pattern being formed with the addition of the portion missing from the child pattern at a time indicated by the missing edge's temporal label.

In accordance with at least one embodiment, a computer-readable medium is provided, the medium tangibly storing thereon computer-executable process steps, the process steps comprising collecting multiple graphs corresponding to a network, the network evolving over time, each graph representing a snapshot reflecting a state of the network; forming one graph by merging the multiple graphs representing the multiple snapshots of the network, the formed graph comprising a set of nodes and a set of edges, each edge connecting two nodes from the set of nodes and having a temporal label; mining the graph to identify multiple patterns, each pattern being a subgraph in the formed graph, each pattern having an associated support; selecting a pattern from the identified patterns; identifying a child pattern of the selected pattern, the identified child pattern having a support that is at least equal to the support of the selected pattern and missing a portion of the pattern, the missing portion of the pattern including at least one edge of the pattern; and creating a graph evolution rule, the rule indicating that any occurrence of the child pattern implies a corresponding occurrence of the pattern, the corresponding occurrence of the pattern being formed with the addition of the portion missing from the child pattern at a time indicated by the missing edge's temporal label.

In accordance with one or more embodiments, a system is provided that comprises one or more computing devices configured to provide functionality in accordance with such embodiments. In accordance with one or more embodiments, functionality is embodied in steps of a method performed by at least one computing device. In accordance with one or more embodiments, program code to implement functionality in accordance with one or more such embodiments is embodied in, by and/or on a computer-readable medium.

DRAWINGS

The above-mentioned features and objects of the present disclosure will become more apparent with reference to the following description taken in conjunction with the accompanying drawings wherein like reference numerals denote like elements and in which:

FIG. 1 provides exemplary representations of graph portions in accordance with one or more embodiments of the disclosure.

FIG. 2 illustrates a support for an exemplary graph in accordance with one or more embodiments of the disclosure.

FIG. 3 shows support for another pattern in accordance with one or more embodiments of the present disclosure.

FIG. 4 provides an example of a head pattern and candidate body patterns in accordance with one or more embodiments of the present disclosure.

FIG. 5 provides an example of a rule generation and usage process flow in accordance with one or more embodiments of the present disclosure.

FIG. 6 provides a rule generation process flow in accordance with one or more embodiments of the present disclosure.

FIG. 7 provides an overview of an exemplary system in accordance with one or more embodiments of the present disclosure.

FIG. 8 provides an example of a block diagram illustrating an internal architecture of a computing device in accordance with one or more embodiments of the present disclosure.

DETAILED DESCRIPTION

In general, the present disclosure includes a graph evolution rule generation system, method and architecture.

Certain embodiments of the present disclosure will now be discussed with reference to the aforementioned figures, wherein like reference numerals refer to like components. Embodiments disclosed herein may be described with reference to a social network; however, it should be apparent that the embodiments are not limited to a social network, and that the embodiments can be applied to any network that can change over time.

In accordance with one or more embodiments, a network is expressed in terms of a graph, which comprises nodes and edges. Using a more formalistic expression, a graph, G, over a set of nodes, V, and edges, E, with a labeling function used to assign labels to nodes and edges, can be expressed as:

G=(V, E, λ) is used to denote a graph G over a set of nodes V and a set of edges, E⊆V×V, with a labeling function λ: V∪E→Σ, assigning labels to nodes and edges from an alphabet Σ. In accordance with one or more embodiments, each edge is defined by an ordered pair of nodes, u, v, from the set of nodes V.

In accordance with at least one embodiment, a label represents a property. A label can change over time, but is not likely to do so. By way of some non-limiting examples, in a social network where nodes represent members of the social network and edges represent a connection between two members, node properties may be gender, country, college, etc., and an edge property can represent information, such as type, for the connection.

In accordance with one or more embodiments, a network can be represented in a form of an undirected graph, and the evolution of the graph over time can be conceptually represented by a series of graphs G₁, . . . , G_(T), with time, t, from 1 to T. At a given time, t, the graph, G_(t), can be expressed as G_(t)=(V_(t), E_(t)), with V_(t) being a set of nodes that is a subset of V and E_(t) being a set of edges that is a subset of E. In accordance with one or more embodiments, G₁, . . . , G_(T) each represent a different snapshot of the same network. While it is possible for one or more nodes or edges to be deleted over time (e.g., a member or a connection between two members of a social network is deleted), it is more likely that nodes or edges will be added (e.g., a member is added and/or a connection between two members of a social network is formed) to the graph over time.

In accordance with one or more embodiments, patterns are mined using a combined dataset, which comprises a single undirected graph, G, obtained by collapsing, or combining, all the snapshots G₁, . . . , G_(T), with each edge in graph G time-stamped to indicate a time for its first appearance. In accordance with one or more such embodiments, graph, G, is a time-evolving graph, which can be represented as G=(V, E, t, λ), with t corresponding to a time stamp assigned to each edge in the set of edges, E.
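By way of a non-limiting illustration, the collapsing of snapshots into a single time-stamped graph might be sketched as follows. The function name `merge_snapshots`, the representation of an undirected edge as a `frozenset` of two node identifiers, and the use of the snapshot index (starting at 1) as the time stamp are assumptions made for this sketch only.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = FrozenSet[str]  # undirected edge stored as an unordered pair of node ids


def merge_snapshots(snapshots: List[Iterable[Tuple[str, str]]]) -> Dict[Edge, int]:
    """Collapse snapshots G_1..G_T into a map from edge to first-appearance time."""
    first_seen: Dict[Edge, int] = {}
    for t, edges in enumerate(snapshots, start=1):
        for u, v in edges:
            # setdefault keeps the earliest time stamp already recorded for an edge.
            first_seen.setdefault(frozenset((u, v)), t)
    return first_seen


# Three hypothetical snapshots of a small network; later snapshots repeat earlier edges.
snapshots = [
    [("a", "b")],
    [("a", "b"), ("b", "c")],
    [("b", "a"), ("b", "c"), ("c", "d")],
]
merged = merge_snapshots(snapshots)
```

In this sketch the edge between `a` and `b` keeps time stamp 1 even though it reappears, in either orientation, in later snapshots.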

FIG. 1 provides exemplary representations of graph portions in accordance with one or more embodiments of the disclosure. The graph portions depicted in FIG. 1, and labeled 100 and 110, are patterns, or subgraphs, identified in a graph, G, obtained by merging multiple snapshots of a network. Patterns 100 and 110 comprise nodes 102 and edges 104, 106 and 108, which connect nodes 102. Each edge has a label, which identifies a time, in absolute or relative terms, associated with the edge, e.g., a time at which the edge between the two nodes 102 is formed. In accordance with one or more embodiments, a relative time is determined from an absolute time at which an edge is formed. In accordance with one or more embodiments, an absolute time can be an absolute time of the snapshot taken of the network, and a relative time can be a delta, Δ, representing a time gap between a pattern's initial edge and the absolute time of the edge in the merged graph to which the pattern's initial edge is mapped. Each of the edges 104, 106 and 108 has a label, which identifies a time for the edge. In the example, the label is in relative time.

With reference to subgraph 100, edge 104 is the first of the edges formed relative to the other edges in the subgraph 100. Edge 104 represents a connection between two of the nodes 102. Edge 106 is formed after edge 104 and before edges 108. Similarly, with reference to subgraph 110, edge 104 is the first edge formed in the subgraph, followed by edge 106 and edges 108. Each relative time label on edges 104, 106 and 108 represents some increment of Δ.

In accordance with one or more embodiments, each of subgraphs 100 and 110 can represent a pattern. In accordance with one or more such embodiments, a pattern, P, of a time-evolving graph, G, is a subgraph of G that, in addition to matching edges of G, also matches their time stamps and, if present, the properties on the nodes and edges of G.

Relative-Time and Absolute-Time Patterns

Assuming that G=(V, E, t, λ) and P=(V_(P), E_(P), t_(P), λ_(P)) are graphs, where G is a time-evolving dataset obtained by merging snapshots of the network and P is a pattern, which is a subgraph of G, an occurrence of P in G can be mapped to G. The mapping can be expressed as φ: V_(P)→V, where φ functionally maps a connected pair of nodes, u and v, in P to a connected pair of nodes in G such that for each mapped edge and corresponding pair, u, v, in a pattern P the following definition, Definition 1, holds true:

(i) ∀(u,v)∈E_(P), it is (φ(u), φ(v))∈E, e.g., the edge between u and v, represented as (u, v), which is an edge in E_(P), can be mapped to an edge, represented as (φ(u), φ(v)), in E;

(ii) ∀(u,v)∈E_(P), it is t(φ(u), φ(v))=t(u,v), e.g., the edge, (u, v), in E_(P) has a time label, t(u,v), which is equivalent to the time label t(φ(u), φ(v)) of the edge in E identified in (i); and

(iii) ∀v∈V_(P), it is λ_(P)(v)=λ(φ(v)), and ∀(u,v)∈E_(P), it is λ_(P)((u, v))=λ((φ(u), φ(v))), e.g., label(s) assigned to a node, v, in a pattern, P, is/are equivalent to the label(s) assigned to the mapped node, φ(v), in the merged graph, G, and the label(s) assigned to the edge (u,v) in P is/are equivalent to the label(s) of the mapped edge in G.

In effect, according to at least one embodiment, a pattern, P, of nodes and edges, together with any labels associated with the nodes and edges of P, maps to at least a portion of graph G. Labels that are in addition to the time label can be optional. In case no labels are present for edges or nodes, the last condition (iii) can be ignored.
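By way of a non-limiting illustration, a check of conditions (i) to (iii) of Definition 1 for a candidate mapping φ might be sketched as follows. The helper names `edge` and `is_absolute_occurrence` and the dictionary-based graph representation are assumptions of this sketch. For brevity it checks node labels only, treats edges as undirected, and assumes the pattern has no self-loops.

```python
from typing import Dict, FrozenSet, Hashable, Optional

Edge = FrozenSet[Hashable]  # undirected edge stored as an unordered node pair


def edge(u: Hashable, v: Hashable) -> Edge:
    return frozenset((u, v))


def is_absolute_occurrence(
    p_times: Dict[Edge, int],
    g_times: Dict[Edge, int],
    phi: Dict[Hashable, Hashable],
    p_labels: Optional[Dict[Hashable, str]] = None,
    g_labels: Optional[Dict[Hashable, str]] = None,
) -> bool:
    """Return True if phi maps pattern P into graph G under Definition 1."""
    for p_edge, t_p in p_times.items():
        u, v = tuple(p_edge)  # assumes no self-loops in the pattern
        mapped = edge(phi[u], phi[v])
        if mapped not in g_times:  # condition (i): the mapped edge exists in G
            return False
        if g_times[mapped] != t_p:  # condition (ii): time stamps are equal
            return False
    if p_labels is not None and g_labels is not None:
        for v, label in p_labels.items():  # condition (iii), node labels only
            if g_labels.get(phi[v]) != label:
                return False
    return True


# Hypothetical merged graph G and pattern P with absolute time stamps.
g_times = {edge(1, 2): 5, edge(2, 3): 6}
p_times = {edge("x", "y"): 5, edge("y", "z"): 6}
phi = {"x": 1, "y": 2, "z": 3}
matches = is_absolute_occurrence(p_times, g_times, phi)
```

When no label dictionaries are supplied, condition (iii) is skipped, which mirrors the case in which no labels are present.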

In accordance with one or more alternative embodiments, a time label can represent a relative time, which can be determined using absolute times. Advantageously, a pattern based on relative times can represent more than one subgraph of graph G; a pattern based on relative times provides a mechanism for generalizing over more than one subgraph of G. By using relative times, identified patterns can refer to patterns in more than one of the snapshots G₁, . . . , G_(T) combined to form G. Stated another way, two subgraphs having the same relative time labels can match a pattern that uses relative time labels regardless of whether the absolute time labels of the two subgraphs match. A relative-time pattern can be considered to be a generalization of one or more absolute-time patterns.

Referring again to FIG. 1, assume that each node in each of patterns 100 and 110 represents an author, and each edge represents a collaboration between two authors. Pattern 100 can be used to predict that, given the collaboration at time 0 represented by edge 104, edge 106, which represents a collaboration between two other authors, is likely to be formed at time 1, relative to time 0; and that yet another collaboration, which is subsequent to the first and second collaborations, is likely to occur, at time 2, between an author from each of the first and second collaborations. One primary aspect of the pattern is the fact that two distinct pairs of connected authors, one collaboration created at time 0, and one at time 1, are later (at time 2) connected by a collaboration involving one author from each pair, plus a third author. Using relative-time patterns, it is possible to account for an occurrence of that event even if it was taking place at absolute times, say, 16, 17 and 18.

To accommodate a relative-time pattern, condition (ii) of Definition 1 can be modified to yield a second definition, Definition 2, which includes conditions (i) and (iii) of Definition 1 and modifies condition (ii) of Definition 1, as follows:

(ii) ∀(u,v)∈E_(P), it is t(φ(u), φ(v))=t(u,v)+Δ, e.g., the edge, (u, v), in E_(P) has a time label, t(u,v), which, when shifted by Δ, is equivalent to the time label t(φ(u), φ(v)) of the edge in E identified in (i), where Δ is a time increment in a relative time set, R, and is the same for every edge of the occurrence.

Using the modified definition, naturally forming equivalence classes of structurally isomorphic relative time patterns, which differ only by a constant on their edge time-stamps, can be obtained. Redundancies in the search space of all relative time patterns can be avoided by picking a representative pattern for each equivalence class, e.g., a pattern where the lowest time-stamp is zero.
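By way of a non-limiting illustration, the modified condition (ii) and the choice of a representative pattern might be sketched as follows. The function names `normalize` and `occurrence_delta` are hypothetical, pattern edges are keyed by node pairs, graph edges are stored as `frozenset` pairs, and labels are omitted; all of these are assumptions of the sketch.

```python
from typing import Dict, FrozenSet, Hashable, Optional, Tuple

PatternTimes = Dict[Tuple[str, str], int]
GraphTimes = Dict[FrozenSet[Hashable], int]


def normalize(pattern_times: PatternTimes) -> PatternTimes:
    """Shift time stamps so that the lowest one is zero (class representative)."""
    base = min(pattern_times.values())
    return {e: t - base for e, t in pattern_times.items()}


def occurrence_delta(
    pattern_times: PatternTimes, graph_times: GraphTimes, phi: Dict[str, Hashable]
) -> Optional[int]:
    """Return the single offset that satisfies modified condition (ii), else None."""
    deltas = set()
    for (u, v), t_p in pattern_times.items():
        mapped = frozenset((phi[u], phi[v]))
        if mapped not in graph_times:  # condition (i) fails
            return None
        deltas.add(graph_times[mapped] - t_p)
    # Every edge must be shifted by the same constant.
    return deltas.pop() if len(deltas) == 1 else None


# A pattern with relative times 0, 1, 2 matched at absolute times 16, 17, 18.
pattern = {("a", "b"): 0, ("c", "d"): 1, ("b", "c"): 2}
graph = {frozenset((1, 2)): 16, frozenset((3, 4)): 17, frozenset((2, 3)): 18}
phi = {"a": 1, "b": 2, "c": 3, "d": 4}
delta = occurrence_delta(pattern, graph, phi)
representative = normalize({("a", "b"): 16, ("c", "d"): 17, ("b", "c"): 18})
```

In this sketch `delta` evaluates to 16, and normalizing the absolute-time pattern recovers the representative whose lowest time stamp is zero.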

Support

In accordance with one or more embodiments, a pattern has an associated support measurement. The support for a pattern P in a graph G can be determined to be the total number of occurrences of the pattern in G. This approach to determining a support for a pattern is not anti-monotonic, however. In accordance with one or more embodiments, support for a pattern is anti-monotonic if the support of the pattern is never larger than the support of a subpattern, or child subgraph, of the pattern. By adopting an approach in which a pattern's support is anti-monotonic, computation is feasible; processing and memory usage for graph mining, for example, is reasonable. In accordance with one or more embodiments, one technique that can be used to ensure anti-monotonicity of a pattern is to make a determination of a support for a pattern P in graph G based on a number of unique nodes in the graph G to which a node of the pattern can be mapped. In so doing, it is possible to ensure that the support for a child subgraph of the pattern is at least equal to the support for the pattern. In accordance with one or more embodiments, a minimum image-based support is used, as described herein and in the paper by Bjorn Bringmann and Siegfried Nijssen, entitled “What Is Frequent in a Single Graph?”, Advances in Knowledge Discovery and Data Mining, 12th Pacific-Asia Conference, PAKDD 2008, Osaka, Japan, May 20-23, 2008, which is incorporated herein in its entirety.

FIG. 2 illustrates a support for an exemplary graph in accordance withone or more embodiments of the disclosure. In the example of FIG. 2, agraph, titled “Host Graph, has nine nodes, numbered 1 to 9. Shading isused to indicate comparable nodes for purposes of determining a patternin this example. As shown in the example, white nodes 1, 8 and 9 arecomparable, light gray nodes 2, 4 and 6 are comparable, and dark graynodes 3, 5 and 7 are comparable. A pattern exists in the host graph of awhite node, dark gray node, light gray node, and white node. The arrowlabeled “A” shows a graph traversal from node 8 (a white node), to node5 (a darker gray node), to node 2 (a light gray node) to node 1 (a whitenode). Traversal “A” can be considered an occurrence of the pattern inthe host graph. The first column under “Embeddings”, i.e., the columnlabeled “A” shows the nodes in occurrence “A” of the pattern. Thecolumns labeled “B” and “C” show the nodes in occurrences “B” and “C”(respectively) of the pattern. In the example, there are threeoccurrences of the pattern. However, while the nodes in “A” and “C” areunique, the nodes in “A” and “C” are not unique, since the last node inoccurrence “A” and “C” is the same node 1. Stated alternatively, in theexample, each of the upper three nodes of the pattern can be mapped tothree nodes, which are unique for each occurrence of the pattern, butthe lower white node of the pattern can only be mapped to two uniquenodes. In accordance with one or more embodiments, the support for thepattern shown in FIG. 2 is assigned a value equal to two, or the numberof occurrences of the pattern that have unique nodes at each node in thepattern. Since the lower white note in the pattern is not unique inoccurrences “A” and “C,” one of the occurrences is not counted indetermining the support for the pattern, in accordance with one or moresuch embodiments. 
The support of the pattern assigned a value of 2, eventhough the number of occurrences of the pattern is three, since two ofthe occurrences share a node, e.g., the lower white node, of thepattern.

FIG. 3 shows support for another pattern in accordance with one or more embodiments of the present disclosure. In the example, patterns 310 and 320 are found in graph 300. Patterns 310 and 320 are subgraphs of graph 300, and pattern 320 is a subgraph of pattern 310. To satisfy anti-monotonicity, the support for pattern 320 should be at least equal to the support for pattern 310. There are two occurrences, occurrences “A” and “B”, of pattern 310 in graph 300, and one occurrence, occurrence “C”, of pattern 320 in graph 300. While the total number of occurrences of pattern 310 is intuitively a meaningful measure, it is not anti-monotonic, since the number of occurrences in graph 300 of pattern 310 is greater than the number of occurrences of pattern 320, a subgraph of pattern 310. Stated another way, the number of occurrences in graph 300 of pattern 320 is 1, while the number of occurrences of its supergraph, pattern 310, is 2, thus violating anti-monotonicity.

In accordance with at least one embodiment, a support measurement, which does not require solving a maximum independent set problem, is used. The support measurement is based on the number of unique nodes in the graph G=(V_(G), E_(G)) that a node of the pattern P=(V_(P), E_(P)) is mapped to, and can be defined using the following exemplary definition, Definition 3:

${\sigma \left( {P,G} \right)} = {\min\limits_{v \in V_{p}}{\left\{ {{\phi_{i}(v)}\text{:}\mspace{14mu} \phi_{i}\mspace{14mu} {is}\mspace{14mu} {an}\mspace{14mu} {occurrence}\mspace{14mu} {of}\mspace{14mu} P\mspace{14mu} {in}\mspace{14mu} G} \right\} }}$

The above approach is advantageous over a measurement that relies on solving a maximum independent set problem, MIS, which is NP-complete, for at least the reason that it is computationally easier to calculate since it does not require the computation of all possible occurrences of a pattern in a graph. Additionally, it does not require solving a maximum independent set problem for each candidate pattern. As a further advantage, the above approach is theoretically an upper bound for overlap-based approaches, so that the support according to the above method is closer to a real number of occurrences in the graph.
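By way of a non-limiting illustration, the minimum image-based support of Definition 3 might be sketched as follows. The function name `min_image_support` and the pattern node names are hypothetical. Occurrence “A” follows the traversal described for FIG. 2 (nodes 8, 5, 2, 1), while the node assignments used for occurrences “B” and “C” are assumptions chosen only to be consistent with the description, i.e., “A” and “C” share node 1 as the lower white node. The sketch takes an explicit list of occurrences for clarity.

```python
from typing import Dict, Hashable, Iterable, List

Occurrence = Dict[str, Hashable]  # maps each pattern node to a node of G


def min_image_support(pattern_nodes: Iterable[str], occurrences: List[Occurrence]) -> int:
    """Minimum, over pattern nodes, of the number of distinct graph nodes mapped to."""
    if not occurrences:
        return 0
    return min(len({phi[v] for phi in occurrences}) for v in pattern_nodes)


pattern_nodes = ["white_top", "dark", "light", "white_bottom"]
occurrences = [
    {"white_top": 8, "dark": 5, "light": 2, "white_bottom": 1},  # occurrence "A"
    {"white_top": 1, "dark": 3, "light": 4, "white_bottom": 9},  # "B" (assumed)
    {"white_top": 9, "dark": 7, "light": 6, "white_bottom": 1},  # "C" (assumed)
]
support = min_image_support(pattern_nodes, occurrences)
```

Here each of the upper three pattern nodes has three distinct images, while the lower white node has only two (nodes 1 and 9), so the sketch yields a support of 2 for three occurrences.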

Graph Evolution Rule (GER) and Confidence

Support of a pattern provides insight into how often such an event may happen compared to other changes. Confidence provides a likelihood for a given sequence of steps. In accordance with one or more embodiments, confidence information is assigned to a graph evolution rule, and represents a level, or degree, of confidence that a pattern exists, or will exist, given an existence of a child subgraph of the pattern identified by the rule. A pattern can be decomposed into particular steps and a confidence can be determined for each transition. Each step can be represented by a graph evolution rule body→(implies) head, where both the body and head are patterns.

In accordance with one or more embodiments, a head pattern is decomposed by discarding all of the edges from the last time step in the head pattern, with the resulting pattern being the body. A GER definition, Definition 4, can be expressed in exemplary formal terms, such that given a head pattern, P_(H), the body, P_(B), is defined by:

E_(B)={e∈E_(H) | t(e)<max_(e*∈E_(H)) t(e*)} and V_(B)={v∈V_(H) | deg(v, E_(B))>0}, where deg(v, E_(B)) denotes the degree of v (where degree represents the number of edges emanating from v) with respect to the edges in E_(B). In accordance with one or more embodiments, P_(B) must be connected, and the support of a GER is the support of its head.

The above definition of a body yields a unique body for each head, and a unique confidence value for each head. As such, a rule can be expressed by the head, without a need to include the body in the rule. By way of a non-limiting example, the body portion of such a rule can be determined from the head using the above definition.
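By way of a non-limiting illustration, deriving a body from a head per Definition 4 might be sketched as follows. The function name `body_of`, the `frozenset` edge representation, and the two head patterns are hypothetical and are not taken from the figures; the sketch returns `None` when no connected body exists.

```python
from typing import Dict, FrozenSet, Optional

Edge = FrozenSet[str]


def body_of(head_times: Dict[Edge, int]) -> Optional[Dict[Edge, int]]:
    """Drop the edges of the last time step; return None if the rest is not connected."""
    last = max(head_times.values())
    body = {e: t for e, t in head_times.items() if t < last}
    if not body:
        return None
    # V_B consists of the nodes that still touch at least one remaining edge.
    nodes = set().union(*body)
    adjacency = {n: set() for n in nodes}
    for e in body:
        u, v = tuple(e)  # assumes no self-loops
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first traversal to test that the body is connected.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in adjacency[n] - seen:
            seen.add(m)
            stack.append(m)
    return body if seen == nodes else None


def e(u: str, v: str) -> Edge:
    return frozenset((u, v))


# Hypothetical head whose earlier edges form a path a-b-c: the body is connected.
connected_head = {e("a", "b"): 0, e("b", "c"): 1, e("c", "d"): 2, e("b", "d"): 2}
# Hypothetical head whose earlier edges a-b and c-d are only joined at the last step.
split_head = {e("a", "b"): 0, e("c", "d"): 1, e("b", "x"): 2, e("c", "x"): 2}

connected_body = body_of(connected_head)
split_body = body_of(split_head)
```

In the first case node `d` is dropped because no remaining edge touches it; in the second case the remaining edges fall into two components, so no rule is formed.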

FIG. 4 provides an example of a head pattern and candidate body patterns in accordance with one or more embodiments of the present disclosure. In the example, pattern 400 can be decomposed into possible connected subgraphs, or child patterns, 401-407. An occurrence of 400 implies an occurrence of all of sub-patterns, or subgraphs, 401-407. Each of sub-patterns 401-407 can be considered a candidate body in order to form a graph evolution rule with pattern 400 as head. In accordance with one or more embodiments, sub-patterns 401 and 403-406 can be discarded based on a determination that these sub-patterns fail to describe an edge emerging in the future. Furthermore, sub-pattern 407 can be eliminated based on a desire to use a small time step to allow for a higher granularity. The remaining sub-pattern, sub-pattern 402, is the sub-pattern determined by decomposing pattern 400 to remove the last time step. To further illustrate, pattern 400 includes four edges, one having a time step of 0, two having a time step of 1, and one having a last time step of 2. Sub-pattern 402 is pattern 400 without the last time step of 2. Sub-pattern 402 comprises the sub-pattern that includes all but the last, in time, edge of pattern 400.

The approach used in accordance with at least one embodiment prevents disconnected graphs as a body, which addresses a lack of a support definition for a disconnected graph. As a result, some frequent patterns may not be decomposed into graph evolution rules. With reference to graph 100 in FIG. 1, after removing all edges with the last, e.g., highest, time stamp, which results in edges 108 with a label equal to 2 being removed, and discarding disconnected nodes, e.g., discarding the upper-most node 102 connected to the graph by edges 108, the graph that remains is a disconnected pattern containing two disconnected components, e.g., a first component that contains nodes 102 connected by edge 106 with label 1, and a second component containing nodes 102 connected by edge 104 with label 0. In accordance with at least one embodiment, since the 1-edge graphs are disconnected, support for the disconnected pattern need not be determined, and graph 100 is not decomposed into a GER. With reference to graph 110 of FIG. 1, removal of the last time step, e.g., removal of edges 108, and removal of the node 102 connected to the graph via an edge 108 yields a connected sub-pattern, e.g., nodes 102 connected by edges 104 and 106. In accordance with at least one embodiment, the resulting sub-pattern can be used as the body for graph 110 in a GER.

As discussed above, a GER can be represented explicitly by identifying both the body and head patterns, or implicitly by identifying the head. The body of the rule can be obtained easily by removing the edges with the highest time step and any nodes disconnected from the head, e.g., nodes disconnected as a result of the edge removal. By way of a non-limiting example, a GER can be expressed simply as graph 110.

In accordance with at least one embodiment, a confidence score for a GER is determined as a ratio of the support for the head of the GER to the support for the body of the GER. The ratio can be expressed as: Support_(H)/Support_(B). With the support being anti-monotonic, the support for the body will be at least as great as the support for the head, which yields a confidence value between zero and one.
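By way of a non-limiting illustration, the confidence ratio might be sketched as follows. The function name `confidence` and the support values are hypothetical; the check that the body support is at least the head support reflects the anti-monotonicity assumption stated above.

```python
def confidence(support_head: int, support_body: int) -> float:
    """Confidence of a GER as Support_H / Support_B."""
    if support_body <= 0:
        raise ValueError("body support must be positive")
    if support_head > support_body:
        # Cannot happen with an anti-monotonic support measure.
        raise ValueError("head support exceeds body support")
    return support_head / support_body


# Hypothetical supports: the head occurs with support 2, its body with support 5.
score = confidence(2, 5)
```

With these assumed values the rule's confidence is 0.4, i.e., the body is extended to the head in roughly two out of five supported cases.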

Graph Evolution Rule Mining (GERM)

In accordance with one or more embodiments, frequent, connected patterns, or graphs, are identified in a dataset, which contains a graph obtained by merging multiple graphs representing snapshots of a network. The identified patterns are then used to generate GERs for the network. In accordance with at least one embodiment, a depth-first search (DFS) strategy is used to mine the graph to identify patterns in the graph. A DFS traversal of the search space leads to very low memory requirements. In performed tests, memory consumption was negligible. GERM is an adaptation of an algorithm, gSpan, described in the article entitled “gSpan: Graph-Based Substructure Pattern Mining”, by Xifeng Yan and Jiawei Han (Expanded Version, UIUC Technical Report, UIUCDCS-R-2002-2296), which is incorporated herein by reference. The adaptation is used to mine a single graph obtained by merging temporal snapshots of a network for frequent, connected subgraphs in the graph, which subgraphs are analyzed to identify patterns, each of which can be representative of an equivalence class of structurally isomorphic relative time patterns that differ by a constant on the edge time steps, or time stamps.

The gSpan algorithm identifies frequently-occurring 1-edge graphs, each of which comprises two nodes and an edge connecting the two nodes, in a set of graphs, GS. More particularly, gSpan removes infrequent vertices and edges, relabels the remaining vertices and edges in descending frequency, and uses the highest frequency DFS code as a starting node for its minimum DFS code canonical form. The frequently-occurring 1-edge subgraphs are used to discover all of the possible child subgraphs that grow from the 1-edge subgraph, e.g., a 1-edge frequent subgraph is grown to one or more 2-edge frequent subgraphs, which are grown to one or more 3-edge frequent subgraphs. The gSpan algorithm uses a DFS code that consists of a 5-tuple to designate an edge, i.e., (i, j, l_(i), l_((i,j)), l_(j)), where i and j designate the nodes, or vertices, l_(i) designates the label for vertex i, l_(j) designates the label for vertex j, and l_((i,j)) designates the label for the edge. An initial set of steps in the gSpan algorithm sorts the labels of the vertices and edges in the set of graphs by their frequency, removes infrequent vertices and edges based on their determined frequencies, relabels the remaining vertices and edges in descending frequency, and sorts the frequent 1-edge graphs remaining in GS in lexicographic order based on the 5-tuple. For example, the 5-tuple (0, 1, A, a, A)<the 5-tuple (0, 1, A, a, B)< . . . .
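By way of a non-limiting illustration, the lexicographic comparison of 5-tuples for 1-edge graphs might be sketched as follows. The sketch relies only on ordinary tuple comparison and hypothetical labels; it is an assumption that this suffices for 1-edge graphs, and it does not reproduce the full DFS lexicographic order that gSpan defines over multi-edge DFS codes.

```python
from typing import List, Tuple

EdgeCode = Tuple[int, int, str, str, str]  # (i, j, l_i, l_(i,j), l_j)

# Hypothetical frequent 1-edge graphs, listed in arbitrary order.
codes: List[EdgeCode] = [
    (0, 1, "B", "a", "A"),
    (0, 1, "A", "b", "A"),
    (0, 1, "A", "a", "B"),
    (0, 1, "A", "a", "A"),
]

# Tuples compare element by element, which gives the lexicographic order.
ordered = sorted(codes)
```

After sorting, (0, 1, A, a, A) precedes (0, 1, A, a, B), which agrees with the ordering example given above.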

The gSpan algorithm has a procedure, Subgraph Mining, to grow child subgraphs rooted by a 1-edge frequent subgraph. The procedure mines multiple graphs in a set, GS, for the child subgraphs. The procedure is executed, or run, recursively. In each recursive run, the procedure grows one edge from a node s, which initially is a node of the 1-edge subgraph and thereafter is a node of a child subgraph of the 1-edge subgraph. The recursion follows a depth-first traversal, such that the minimum DFS codes of previously-discovered subgraphs are less than the minimum DFS codes of later-discovered ones. Initially, the procedure prunes duplicate subgraphs and all of their descendants. If s is the minimum DFS code of the graph it represents, the procedure adds s to its frequent subgraph set, and then generates all potential children with one edge growth, and recursively runs the procedure on each child whose support satisfies a minimum support threshold. An enumeration procedure locates s in all of the graphs in the graph dataset, GS, and counts the occurrences of all of the children of s over all the graphs in GS.

Like gSpan, GERM mines using a DFS approach. In contrast to gSpan, which mines a set of multiple graphs, however, GERM mines a single graph, which is obtained by merging graphs that are temporal snapshots of a network. Thus, GERM extracts patterns from a single graph. In further contrast, GERM has edge labels that represent a time of the edge. Furthermore, GERM replaces gSpan's support calculation, which amounts to a frequency of occurrence of a subgraph (e.g., the number of graphs in the set in which the subgraph occurs or the number of occurrences of the subgraph in the set of graphs), with a minimum image-based support calculation, which is anti-monotonic, as described herein. The following provides some exemplary pseudo code of steps of a Subgraph_Mining procedure used by GERM in accordance with one or more disclosed embodiments:

Algorithm 1 - SubgraphMining(GS, S, s)
    if s ≠ min(s) then return   // using canonical form with lowest time stamp
    S ← S ∪ {s}
    generate all of s's potential children with one edge growth
    Enumerate(s)
    for all c, where c is a child of s, do
        // using definition of support based on relative or absolute time
        if support(c) ≥ minSupp then
            s ← c
            SubgraphMining(GS, S, s)

One of the key elements in gSpan is the use of the minimum DFS code, which is a canonical form introduced to avoid multiple generations of the same pattern. In contrast to gSpan, which uses a minimum DFS code that represents the highest-frequency 1-edge graph and selects the minimum DFS code for s, GERM's canonical form includes a time stamp, and GERM selects a 1-edge graph that has the lowest time stamp. As discussed hereinabove, embodiments of the present disclosure identify one representative pattern per equivalence class; e.g., one pattern with the lowest time stamp being zero, which represents t or an increment of t, e.g., t+Δ, t+2Δ, etc. This is achieved by modifying the canonical form such that the first edge in the canonical form is always the one with the lowest time stamp, e.g., absolute time stamp, as compared to gSpan, in which the highest-frequency label is used as a starting node for the canonical form. Any pattern grown from such a pattern with the modified canonical form will have the same lowest time stamp, which is set to zero, based on a simple constraint on the first edge. This ensures that only one pattern per equivalence class is extracted, which dramatically increases performance and eliminates redundancy in the output.
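By way of a non-limiting illustration, the selection of one representative per equivalence class can be sketched in Python by shifting the time stamps of a pattern so that the lowest one becomes zero. The normalize function and the edge encoding (u, v, time stamp) are assumptions made for this sketch; it also assumes that the two patterns already use the same node identifiers, whereas a complete implementation would additionally test for structural isomorphism.

```python
def normalize(pattern):
    """Shift time stamps so that the lowest time stamp in the pattern is zero.

    pattern: non-empty iterable of (u, v, t) edges (hypothetical encoding).
    """
    edges = list(pattern)
    t_min = min(t for _, _, t in edges)
    return frozenset((u, v, t - t_min) for u, v, t in edges)


# Two patterns that differ only by a constant on their edge time stamps.
p1 = {("x", "y", 3), ("y", "z", 5)}
p2 = {("x", "y", 7), ("y", "z", 9)}

# Both map to the same representative, whose lowest time stamp is zero.
representative = normalize(p1)
same_class = normalize(p1) == normalize(p2)
```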

In accordance with one or more embodiments, the graph obtained by merging snapshots of a network uses edge labels in absolute time, and a pattern uses relative time, with an initial edge having a relative time of zero. In accordance with one or more such embodiments, when matching a pattern to the merged graph, the value of Δ, which represents a time gap between the pattern and the merged graph, is a fixed value. A match between the pattern and the merged graph exists where all of the remaining edges adhere to this value of Δ, or increments of Δ. If all the edges match with the Δ set when matching the first edge, the pattern is discovered to match the merged graph with that value of Δ.
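By way of a non-limiting illustration, the matching condition described above can be sketched in Python for a given candidate mapping of pattern nodes to graph nodes. The match_offset function, the dictionary encodings, and the treatment of edges as ordered node pairs are assumptions made for this sketch; finding the candidate mapping itself (the subgraph-isomorphism search) is outside its scope.

```python
def match_offset(pattern, graph, phi):
    """Return the time gap if phi embeds the pattern in the graph, else None.

    pattern: dict {(u, v): relative time}, first edge listed first (time 0).
    graph:   dict {(a, b): absolute time} for the merged graph.
    phi:     dict mapping pattern nodes to graph nodes (hypothetical input).
    """
    delta = None
    for (u, v), t_rel in pattern.items():
        t_abs = graph.get((phi[u], phi[v]))
        if t_abs is None:
            return None            # the mapped edge does not exist
        if delta is None:
            delta = t_abs - t_rel  # gap fixed when the first edge is matched
        elif t_abs - t_rel != delta:
            return None            # a later edge violates the fixed gap
    return delta


pattern = {("a", "b"): 0, ("b", "c"): 1}
graph = {(1, 2): 4, (2, 3): 5, (3, 4): 9}

gap = match_offset(pattern, graph, {"a": 1, "b": 2, "c": 3})
no_match = match_offset(pattern, graph, {"a": 2, "b": 3, "c": 4})
```

In this sketch the first mapping matches with a gap of 4, while the second mapping fails because its second edge does not adhere to the gap fixed by its first edge.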

Large graphs and high degrees give rise to increased computational complexity of the search. In particular, having nodes with large degree increases the possible combinations that have to be evaluated for each subgraph-isomorphism test. In accordance with one or more embodiments, large graphs, e.g., graphs which have many nodes and high-degree nodes, i.e., nodes with a high number of edges, are managed using a user-defined constraint specifying the maximum number of edges in a pattern. In typical applications of frequent subgraph mining in the transactional setting, such as biology and chemistry, the graphs are of small size and do not have high-degree nodes. The constraint deals more efficiently with the DFS strategy by reducing the search space.

FIG. 5 provides an example of a rule generation and usage process flow in accordance with one or more embodiments of the present disclosure. In accordance with one or more embodiments, the process flow can be implemented from program code executed by one or more computing devices. FIG. 7 provides an example of a computing system, which comprises one or more computing devices, implementing functionality in accordance with one or more embodiments of the present disclosure.

At step 502, a plurality of snapshots of a network, such as without limitation a social network, are merged to generate a combined representation of the network's evolution over time. In accordance with at least one embodiment and with reference to FIG. 7, each snapshot is in a form of a graph, which represents the network at a time that the snapshot of the network is taken. By way of a non-limiting example, the network can be in graph form, and a snapshot can be taken by saving a copy of the network graph. In accordance with at least one embodiment, a snapshot is taken at time t and at increments defined by Δ, which can be any increment of time, e.g., minute, hour, day, week, etc. By way of a non-limiting example, the increment can be based on the degree to which the network changes over time, which can be determined using GERs generated for the network, or another network.

In accordance with one or more embodiments, the merged graph has a time property, an absolute time property, as a label associated with each edge, which identifies a time of a connection between two nodes, e.g., the time of the snapshot in which the edge first appears. By way of some non-limiting examples, an edge label in absolute time can be t, t+Δ, t+2Δ, . . . , t+nΔ, and in relative time, the time label can be 0, 1, 2, etc., i.e., the multiplier applied to Δ. In accordance with one or more embodiments, a minimum support threshold can be based on the Δ, e.g., a greater threshold is used for a Δ of month(s) as compared to a Δ of week(s). In accordance with at least one embodiment, step 502 is performed by a graph merging component, graph merger 702, of system 700.
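By way of a non-limiting illustration, the merging of step 502 can be sketched in Python as follows. The merge_snapshots function and the input encoding, a list of (time, edges) pairs with undirected edges, are assumptions made for this sketch; each edge of the merged graph is labeled with the time of the snapshot in which the edge first appears.

```python
def merge_snapshots(snapshots):
    """Merge snapshots into one graph: {edge: time of first appearance}.

    snapshots: list of (time, iterable of (u, v) edges); hypothetical encoding.
    Edges are treated as undirected by storing them as frozensets.
    """
    merged = {}
    for t, edges in sorted(snapshots, key=lambda snap: snap[0]):
        for u, v in edges:
            # setdefault keeps the earliest time stamp already recorded.
            merged.setdefault(frozenset((u, v)), t)
    return merged


# Three hypothetical snapshots taken at times 0, 1 and 2 (multiples of the increment).
snapshots = [
    (0, [("a", "b")]),
    (1, [("a", "b"), ("b", "c")]),
    (2, [("b", "c"), ("c", "a")]),
]
merged = merge_snapshots(snapshots)
```

In this sketch the edge between a and b keeps the label 0 even though it also occurs in the later snapshot.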

Referring again to FIG. 5, at step 504 the merged graph is analyzed to identify a plurality of subgraphs of the merged graph. As discussed above, in accordance with at least one embodiment, a GERM procedure is used to grow 1-edge subgraphs identified in the merged graph to identify subgraphs. The GERM procedure identifies equivalence classes of structurally isomorphic relative time patterns which differ by a constant on their edge time stamps. In accordance with one or more embodiments, the pattern selected for each equivalence class represents the equivalence class and is the pattern where the lowest time stamp is set to zero. As discussed above, in accordance with at least one embodiment, the first edge, e.g., the edge in the 1-edge subgraph, used by GERM to grow a subgraph is the edge with the lowest time stamp. This is in contrast to gSpan, where the highest-frequency label is used as a starting node to grow a subgraph. In accordance with one or more embodiments, step 504 is performed by component 704 (of FIG. 7), which implements GERM.

At step 506 of FIG. 5, GERs are generated from the subgraphs of the merged graph identified in step 504. Briefly, patterns are identified from the identified subgraphs, a child subgraph of an identified pattern is selected, a rule is formed from the identified pattern and child, and a confidence score is generated for the rule using the support for the pattern and the support for the child. In accordance with one or more embodiments of the present disclosure, step 506 is implemented by a rule generator component, rule generator 706, of system 700.

FIG. 6 provides a rule generation process flow in accordance with one or more embodiments of the present disclosure. In accordance with one or more such embodiments, the process flow can be implemented by rule generator 706 of system 700.

At step 602, a determination is made whether or not all of the pattern subgraphs identified in step 504 of FIG. 5 have been processed. If so, processing ends. If not, processing continues at step 604 to get the first, or next, pattern to be processed. At step 606, a child subgraph identified by GERM is selected for the pattern. By way of a non-limiting example, the selected child subgraph is a subgraph that includes all but the last, in time, edge of the current pattern.

In accordance with one or more embodiments, since the support measure is anti-monotonic, the support for the selected child subgraph in the merged graph is at least equal to the support for the current pattern.
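By way of a non-limiting illustration, one reading of a minimum image-based support can be sketched in Python as the smallest number of distinct graph nodes onto which any single pattern node is mapped. The min_image_support function, the embedding lists, and the node names are assumptions made for this sketch and are not taken from the figures.

```python
def min_image_support(pattern_nodes, embeddings):
    """Smallest number of distinct images over all nodes of the pattern.

    embeddings: list of dicts mapping each pattern node to a graph node
    (hypothetical input, assumed to come from the enumeration step).
    """
    if not embeddings:
        return 0
    return min(len({phi[v] for phi in embeddings}) for v in pattern_nodes)


# Hypothetical embeddings of a child subgraph with nodes a and b.
child_embeddings = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
# Only one of them extends to the larger pattern with the extra node c.
pattern_embeddings = [{"a": 1, "b": 2, "c": 5}]

child_support = min_image_support(["a", "b"], child_embeddings)
pattern_support = min_image_support(["a", "b", "c"], pattern_embeddings)
```

In this sketch the support of the larger pattern does not exceed the support of its child, which is the anti-monotonic behavior relied upon above.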

Processing continues at step 608 to create a GER, e.g., body → head (body implies head), using the selected child subgraph as the body and the current pattern as the head. At step 610, a support equal to the support of the pattern is assigned for the generated GER. At step 612, a confidence score is assigned to the GER, which is a ratio of the pattern's support to the child's support.
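By way of a non-limiting illustration, steps 606 through 612 can be sketched in Python as follows. The make_rule function, the edge encoding (u, v, time stamp), and the support values are assumptions made for this sketch; where several edges share the greatest time stamp, the sketch removes all of them, which is one possible choice rather than a requirement of the process flow.

```python
def make_rule(pattern, support):
    """Build a rule body -> head from a pattern and a table of supports.

    pattern: frozenset of (u, v, t) edges, used as the head of the rule.
    support: dict mapping patterns (frozensets) to their support values.
    """
    t_last = max(t for _, _, t in pattern)
    # Body: the pattern without its last, in time, edge(s).
    body = frozenset(e for e in pattern if e[2] < t_last)
    if body not in support:
        return None  # no usable child subgraph, e.g., a 1-edge pattern
    return {
        "body": body,
        "head": pattern,
        "support": support[pattern],                    # step 610
        "confidence": support[pattern] / support[body], # step 612
    }


head = frozenset({("a", "b", 0), ("b", "c", 1)})
body = frozenset({("a", "b", 0)})
support = {head: 30, body: 40}  # hypothetical support values

rule = make_rule(head, support)
```

With the hypothetical supports above, the rule receives a support of 30 and a confidence of 30/40.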

Typically, the number of edge deletions in a network, e.g., a social network, is so small as to be negligible when analyzing a temporal evolution of the network. In accordance with one or more embodiments, deletions can be accommodated by modifying a matching operator to handle edge deletions. In the following example, it is assumed that an edge can appear and disappear once. The extension considers two time stamps, t_(I) (time of insertion) and t_(D) (time of deletion), on each edge instead of a single time t. Condition (ii) of definitions 1 and 2 becomes:

(ii) for each edge, (u, v), in E_(P), t_(I)(φ(u), φ(v)) = t_(I)(u, v) + Δ and t_(D)(φ(u), φ(v)) = t_(D)(u, v) + Δ.
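By way of a non-limiting illustration, the modified condition (ii) can be sketched in Python as a check over a candidate mapping φ and a candidate value of Δ. The matches_with_deletions function and the dictionary encodings are assumptions made for this sketch, as is the convention that an edge which has not been deleted carries None as its deletion time and matches only another undeleted edge.

```python
def matches_with_deletions(pattern, graph, phi, delta):
    """Check condition (ii) with insertion and deletion time stamps.

    pattern, graph: dict {(u, v): (t_insert, t_delete)}; t_delete may be None
    for an edge that was never deleted (assumed convention).
    phi: dict mapping pattern nodes to graph nodes.
    """
    for (u, v), (p_ins, p_del) in pattern.items():
        times = graph.get((phi[u], phi[v]))
        if times is None:
            return False
        g_ins, g_del = times
        if g_ins != p_ins + delta:
            return False
        if (p_del is None) != (g_del is None):
            return False
        if p_del is not None and g_del != p_del + delta:
            return False
    return True


pattern = {("a", "b"): (0, 2), ("b", "c"): (1, None)}
graph = {(1, 2): (5, 7), (2, 3): (6, None)}
phi = {"a": 1, "b": 2, "c": 3}

ok = matches_with_deletions(pattern, graph, phi, 5)
shifted = matches_with_deletions(pattern, graph, phi, 4)
```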

In accordance with one or more embodiments of the present disclosure, one or more computing devices are configured to comprise functionality described herein. The computing device can be, without limitation, a server, personal computer, personal digital assistant (PDA), wireless device, cell phone, internet appliance, media player, home theater system, media center, or the like. For the purposes of this disclosure a computing device includes a processor and memory for storing and executing program code, data and software, and may be provided with an operating system that allows the execution of software applications in order to manipulate data. A computing device can include, for example, one or more processors, memory, a removable media reader, network interface, display and interface, and one or more input devices, e.g., keyboard, keypad, mouse, etc., and input device interface. One skilled in the art will recognize that the computing device may be configured in many different ways and implemented using many different combinations of hardware, software, or firmware.

In an embodiment, the GERs are generated for a network that is defined based on information obtained via computing devices interconnected via a computer network. In accordance with one or more embodiments, the computer network may be the Internet, an intranet (a private version of the Internet), or any other type of network. An intranet is a computer network allowing data transfer between computing devices on the network. Such a network may comprise personal computers, mainframes, servers, network-enabled hard drives, and any other computing device capable of connecting to other computing devices via an intranet. An intranet uses the same Internet protocol suite as the Internet. Two of the most important elements in the suite are the transmission control protocol (TCP) and the Internet protocol (IP).

FIG. 8 is a detailed block diagram illustrating an internal architecture of a computing device, e.g., a computing device of system 700, in accordance with one or more embodiments of the present disclosure. As shown in FIG. 8, internal architecture 800 includes one or more processing units (also referred to herein as CPUs) 812, which interface with at least one computer bus 802. Also interfacing with computer bus 802 are computer-readable storage medium, or media, 806; network interface 814; memory 804, e.g., random access memory (RAM), run-time transient memory, read only memory (ROM), etc.; media disk drive interface 808 as an interface for a drive that can read and/or write to media including removable media such as floppy, CD-ROM, DVD, etc. media; display interface 810 as an interface for a monitor or other display device; keyboard interface 816 as an interface for a keyboard; pointing device interface 818 as an interface for a mouse or other pointing device; and miscellaneous other interfaces not shown individually, such as parallel and serial port interfaces, a universal serial bus (USB) interface, and the like.

Memory 804 interfaces with computer bus 802 so as to provide information stored in memory 804 to CPU 812 during execution of software programs such as an operating system, application programs, device drivers, and software modules that comprise program code, and/or computer-executable process steps, incorporating functionality described herein, e.g., one or more of the process flows described herein. CPU 812 first loads computer-executable process steps from storage, e.g., memory 804, fixed disk 806, removable media drive, and/or other storage device. CPU 812 can then execute the loaded computer-executable process steps. Stored data, e.g., data stored by a storage device, can be accessed by CPU 812 during the execution of computer-executable process steps.

Persistent storage, e.g., fixed disk 806, can be used to store an operating system and one or more application programs. Persistent storage can also be used to store device drivers, such as one or more of a digital camera driver, monitor driver, printer driver, scanner driver, or other device drivers, web pages, content files, playlists and other files. Persistent storage can further include program modules and data files used to implement one or more embodiments of the present disclosure, e.g., listing selection module(s), targeting information collection module(s), and listing notification module(s), the functionality and use of which in the implementation of the present disclosure are discussed in detail herein.

For the purposes of this disclosure a computer readable medium stores computer data, which data can include computer program code executable by a computer, in machine readable form. By way of example, and not limitation, a computer readable medium may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.

Those skilled in the art will recognize that the methods and systems of the present disclosure may be implemented in many manners and as such are not to be limited by the foregoing exemplary embodiments and examples. In other words, functional elements being performed by single or multiple components, in various combinations of hardware and software or firmware, and individual functions, may be distributed among software applications at either the client or server or both. In this regard, any number of the features of the different embodiments described herein may be combined into single or multiple embodiments, and alternate embodiments having fewer than, or more than, all of the features described herein are possible. Functionality may also be, in whole or in part, distributed among multiple components, in manners now known or to become known. Thus, myriad software/hardware/firmware combinations are possible in achieving the functions, features, interfaces and preferences described herein. Moreover, the scope of the present disclosure covers conventionally known manners for carrying out the described features and functions and interfaces, as well as those variations and modifications that may be made to the hardware or software or firmware components described herein as would be understood by those skilled in the art now and hereafter.

While the system and method have been described in terms of one or more embodiments, it is to be understood that the disclosure need not be limited to the disclosed embodiments. It is intended to cover various modifications and similar arrangements included within the spirit and scope of the claims, the scope of which should be accorded the broadest interpretation so as to encompass all such modifications and similar structures. The present disclosure includes any and all embodiments of the following claims.

1-24. (canceled)
25. A method comprising: collecting, by at least one processing unit, multiple graphs corresponding to a network, the network evolving over time, each graph representing a snapshot reflecting a state of the network; forming, by the at least one processing unit, one graph by merging the multiple graphs representing the multiple snapshots of the network; mining, by the at least one processing unit, the formed graph to identify multiple patterns, each pattern being a subgraph in the formed graph, each pattern having an associated support; selecting, by the at least one processing unit, a pattern from the identified patterns; identifying, by the at least one processing unit, a child pattern of the selected pattern, the identified child pattern having a support that is at least equal to the support of the selected pattern; creating, by the at least one processing unit, a graph evolution rule, the rule indicating that any occurrence of the child pattern implies a corresponding occurrence of the selected pattern; and assigning, by the at least one processing unit, a support to the graph evolution rule, the assigned support being equal to the support of the selected pattern.
26. The method of claim 25, wherein the formed graph by merging the multiple graphs comprising a set of nodes and a set of edges, each edge connecting two nodes from the set of nodes and having a temporal label.
27. The method of claim 26, wherein the child pattern missing a portion of the selected pattern, the missing portion of the selected pattern including at least one edge of the selected pattern.
28. The method of claim 27, wherein the corresponding occurrence of the selected pattern being formed with the addition of the portion missing from the child pattern at a time indicated by a missing edge's temporal label.
29. The method of claim 26, wherein the temporal label in the formed graph is absolute time and the missing edge's temporal label is relative time.
30. The method of claim 26, wherein the temporal label of an initial edge of the selected pattern and the child pattern is a relative time of zero.
31. The method of claim 27, wherein the at least one edge of the selected pattern of the missing portion of the selected pattern has a temporal label with the highest value in the selected pattern.
32. The method of claim 25, wherein the support of the selected pattern is a minimum possible number of mappings of a node of the selected pattern in the formed graph.
33. The method of claim 25, further comprising: assigning, by the at least one processing unit, a confidence to the graph evolution rule, the assigned confidence is equal to a ratio of the support of the selected pattern to the support of the child pattern.
34. The method of claim 26, identifying the child pattern of the selected pattern further comprising: identifying, by the at least one processing unit, the child pattern of the selected pattern that includes all but one of the edges of the selected pattern, the missing edge having a temporal label that has the greatest value of the temporal labels assigned to edges of the selected pattern.
35. A system comprising: at least one computing device, the at least one computing device comprising a processor and a non-transitory computer readable storage medium having stored thereon: a graph merging component that: collects multiple graphs corresponding to a network, the network evolving over time, each graph representing a snapshot reflecting a state of the network; forms one graph by merging the multiple graphs representing the multiple snapshots of the network; mines the formed graph to identify multiple patterns, each pattern being a subgraph in the formed graph, each pattern having an associated support; selects a pattern from the identified patterns; identifies a child pattern of the selected pattern, the identified child pattern having a support that is at least equal to the support of the selected pattern; creates a graph evolution rule, the rule indicating that any occurrence of the child pattern implies a corresponding occurrence of the selected pattern; and assigns a support to the graph evolution rule, the assigned support being equal to the support of the selected pattern.
36. The system of claim 35, wherein the formed graph by merging the multiple graphs comprising a set of nodes and a set of edges, each edge connecting two nodes from the set of nodes and having a temporal label.
37. The system of claim 36, wherein the child pattern missing a portion of the selected pattern, the missing portion of the selected pattern including at least one edge of the selected pattern.
38. The system of claim 37, wherein the corresponding occurrence of the selected pattern being formed with the addition of the portion missing from the child pattern at a time indicated by a missing edge's temporal label.
39. The system of claim 36, wherein the temporal label in the formed graph is absolute time and the missing edge's temporal label is relative time.
40. The system of claim 36, wherein the temporal label of an initial edge of the selected pattern and the child pattern is a relative time of zero.
41. The system of claim 37, wherein the at least one edge of the selected pattern of the missing portion of the selected pattern has a temporal label with the highest value in the selected pattern.
42. The system of claim 35, wherein the support of the selected pattern is a minimum possible number of mappings of a node of the selected pattern in the formed graph.
43. The system of claim 35, wherein the graph evolution rule generator assigns a confidence to the graph evolution rule, the assigned confidence is equal to a ratio of the support of the selected pattern to the support of the child pattern.
44. The system of claim 36, the graph evolution rule generator identifies the child pattern of the selected pattern by identifying the child pattern of the selected pattern that includes all but one of the edges of the selected pattern, the missing edge having a temporal label that has the greatest value of the temporal labels assigned to edges of the selected pattern.
45. A non-transitory computer-readable medium tangibly storing thereon computer-executable process steps, the process steps comprising: collecting multiple graphs corresponding to a network, the network evolving over time, each graph representing a snapshot reflecting a state of the network; forming one graph by merging the multiple graphs representing the multiple snapshots of the network; mining the formed graph to identify multiple patterns, each pattern being a subgraph in the formed graph, each pattern having an associated support; selecting a pattern from the identified patterns; identifying a child pattern of the selected pattern, the identified child pattern having a support that is at least equal to the support of the selected pattern; creating a graph evolution rule, the rule indicating that any occurrence of the child pattern implies a corresponding occurrence of the selected pattern; and assigning a support to the graph evolution rule, the assigned support being equal to the support of the selected pattern.